Resolving Coordinate Structures for Chinese Constituent Parsing

Authors

  • Yichu Zhou
  • Shujian Huang
  • Xinyu Dai
  • Jiajun Chen

Abstract

Coordinate structures are linguistic structures consisting of two or more conjuncts, which usually combine into a larger constituent that behaves as a single unit. However, the boundary of each conjunct is difficult to identify, which makes it hard to parse both the coordinate structure as a whole and the larger structures that contain it. In labeled data such as the Penn Chinese Treebank (CTB), coordinate structures are not annotated explicitly, which further complicates the problem. In this paper, we treat resolving coordinate structures as an independent sub-problem of parsing. We first define coordinate structures explicitly and design rules to extract them from the labeled CTB data. We then propose a specially designed grammar for automatically parsing coordinate structures, along with two groups of new features that better model coordinate structures in a shift-reduce parsing framework. Our approach achieves a 15% improvement in F1 score on resolving coordinate structures.
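The abstract does not spell out the extraction rules used on CTB. Purely as an illustration, the sketch below assumes a simplified rule that is not taken from the paper: a constituent counts as coordinate when its children include a coordinating conjunction (CTB tag `CC`) or an enumeration comma `、` (tag `PU`), and these separators split the remaining children into at least two conjuncts. The helpers `parse_bracketed`, `is_separator` and `extract_coordinations` are hypothetical names introduced here. This is not the authors' code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminal (POS-tag) nodes


def parse_bracketed(s: str) -> Tree:
    """Parse a single CTB-style bracketed tree such as '(NP (NN 经济))'."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "(":  # preterminal: (TAG word)
            word = tokens[pos]
            pos += 2  # skip the word and the closing ")"
            return Tree(label, word=word)
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1  # consume ")"
        return Tree(label, children)

    return node()


def is_separator(t: Tree) -> bool:
    # Hypothetical rule: CC, or the enumeration comma 、 tagged PU.
    return t.label == "CC" or (t.label == "PU" and t.word == "、")


def leaves(t: Tree) -> List[str]:
    if t.word is not None:
        return [t.word]
    return [w for c in t.children for w in leaves(c)]


def extract_coordinations(t: Tree, found=None) -> List[Tuple[str, List[str]]]:
    """Return (label, conjunct strings) for every coordinate constituent."""
    if found is None:
        found = []
    if t.word is None:
        groups, current, has_sep = [], [], False
        for child in t.children:
            if is_separator(child):
                has_sep = True
                if current:
                    groups.append(current)
                current = []
            else:
                current.append(child)
        if current:
            groups.append(current)
        # Conjuncts are the maximal runs of non-separator children.
        if has_sep and len(groups) >= 2:
            conjuncts = ["".join(w for c in g for w in leaves(c)) for g in groups]
            found.append((t.label, conjuncts))
        for child in t.children:
            extract_coordinations(child, found)
    return found


if __name__ == "__main__":
    tree = parse_bracketed("(IP (NP (NN 经济) (CC 和) (NN 文化)) (VP (VV 发展)))")
    print(extract_coordinations(tree))  # [('NP', ['经济', '文化'])]
```

A real rule set would also need to handle the outer unlabeled bracket that CTB files wrap around each tree, as well as comma-separated clause coordination. Both are left out here to keep the sketch short.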

Similar resources

Generative Modeling of Coordination by Factoring Parallelism and Selectional Preferences

We present a unified generative model of coordination that considers parallelism of conjuncts and selectional preferences. Parallelism of conjuncts, which frequently characterizes coordinate structures, is modeled as a synchronized generation process in the generative parser. Selectional preferences learned from a large web corpus provide an important clue for resolving the ambiguities of coord...

Processing Parallel Structure: Evidence from Eye-Tracking and a Computational Model

The parallelism effect in human parsing is a phenomenon in which the second constituent of a coordinate structure is processed faster when it parallels the first constituent in comparison with when it does not parallel the first constituent. The main aim of this paper is to investigate whether the parallelism effect, which was first discovered in ambiguous coordinate structures, also occurs in ...

Resolving Ambiguities of Chinese Conjunctive Structures by Divide-and-conquer Approaches

This paper presents a method to enhance a Chinese parser in parsing conjunctive structures. Long conjunctive structures cause long-distance dependencies and tremendous syntactic ambiguities. Purely syntactic approaches can hardly determine the boundaries of conjunctive phrases properly. In this paper, we propose a divide-and-conquer approach which overcomes the difficulty of data-sparseness of the tra...

Sentence Segmentation and Coordination Construction Processing with FB-LTAG

Feature-based Tree Adjoining Grammar (FB-LTAG) can handle linguistic characteristics and various syntactic phenomena of languages such as English, Korean, and Chinese. This paper proposes a sentence analysis method that can parse coordination with FB-LTAG. Coordination processing is based on dynamic constituent decision and feature unification. Furthermore, we built several ...

Why is German Dependency Parsing More Reliable than Constituent Parsing?

In recent years, research in parsing has extended in several new directions. One of these directions is concerned with parsing languages other than English. Treebanks have become available for many European languages, but also for Arabic, Chinese, or Japanese. However, it was shown that parsing results on these treebanks depend on the types of treebank annotations used [ , ]. Another direction ...

Publication date: 2015